Abstract


We present a Discriminative Switching Linear Dynamical System (DSLDS)
applied to patient monitoring in Intensive Care Units (ICUs). Our
approach is based on identifying the state-of-health of a patient given
their observed vital signs using a discriminative classifier, and then
inferring their underlying physiological values conditioned on this
status. The work builds on the Factorial Switching Linear Dynamical
System (FSLDS) (Quinn et al., 2009), which has been previously used in a
similar setting. The FSLDS is a generative model, whereas the DSLDS is a
discriminative model. We demonstrate on two real-world datasets that the
DSLDS is able to outperform the FSLDS in most cases of interest, and
that an α-mixture of the two models achieves higher performance than
either of the two models separately.


Condition monitoring of patients in intensive care units (ICUs) based on
vital signs (e.g. heart rate, blood pressure) is of critical importance,
as they can be subject to a number of serious physiological events such
as bradycardia and hypotension. However, a variety of artifactual
processes can “contaminate” the data, e.g. the taking of blood samples,
performing suctions, recalibrating sensors, etc. These artifactual
processes complicate the task of identifying the important physiological
events and are the main source of false alarms in ICUs. Moreover, it is
of interest to maintain beliefs about the true physiological values of a
patient when these cannot be directly observed due to artifact. For
example, it would be desirable to display the patient’s estimated blood
pressure when the corresponding measuring device has been disconnected
or is otherwise displaying artifactual values (as is the case during a
blood sample event). Of course, this estimate should be clearly
distinguishable from the raw data (e.g. by using a different display
colour).

One approach to this problem is to build a latent variable model, using
a number of discrete latent variables to model the physiological and
artifactual events through time, and a linear dynamical system (LDS)
conditional on these discrete variables to model the associated dynamics
in the vital signs observations. This is the factorial switching LDS (or
FSLDS) of Quinn et al. (2009). However, we have noticed that in building
such systems it is necessary to construct quite detailed models of the
artifactual events in order to capture them properly. This can be
non-trivial since some of these events can be highly variable, which is
hard to capture with a generative model. Despite this high variability,
the vital signs can still contain informative features which could act
as input to a discriminative model. Thus, if it is possible to build
such a model that can fairly easily distinguish between the various
events, then it would seem simpler to make the discrete-state inference
discriminative, and use FSLDS-style inference for the continuous latent
variables conditional on the inferred discrete state. We call this a
discriminative switching linear dynamical system (DSLDS). In this paper
we compare the FSLDS and DSLDS models on two ICU condition monitoring
datasets. The results show that using the DSLDS gives increased
performance in most cases of interest, and that an α-mixture of the two
methods was able to achieve a higher performance than either of the two
models separately.

To summarise, our goal is to build a model with increased performance
for the following tasks:

• Identifying artifactual processes (e.g. blood samples), which will
reduce the high false alarm rate in ICUs and facilitate the task of
identifying physiological processes.

• Identifying physiological processes which can be of critical
importance (e.g. bradycardias).

• Providing an estimate of a patient’s true physiological values when
these are obscured by artifact.

The structure of the remainder of the paper is as follows: in Section 1
we give a description of our proposed model and compare its graphical
structure and inference methods to those of the FSLDS, and briefly
describe related work. In Section 2 we describe our experiments and
provide results for the comparison between the DSLDS and the FSLDS.
Finally, in Section 3 we conclude with general remarks about our
proposed model and suggestions for future work.


1 Model description 


The graphical model of the FSLDS is depicted in Figure 1 (top). It
operates on three different sets of variables: The observed variables,
$y_t \in \mathbb{R}^{d_y}$, represent the patient’s vital signs obtained
from the monitoring devices at time $t$, which act as the input to our
model. The continuous latent variables, $x_t \in \mathbb{R}^{d_x}$,
track the evolution of the dynamics of a patient’s underlying
physiology. The discrete variable, $s_t$, represents the switch setting
or regime which the patient is currently in (e.g. stable, a blood sample
is being taken etc.). The switch variable can be factorised according to
the cross-product of $M$ factors, so that
$s_t = f_t^{(1)} \otimes f_t^{(2)} \otimes \dots \otimes f_t^{(M)}$.
Each factor variable, $f_t^{(m)}$, is usually a binary vector indicating
the presence or absence of a factor, but in general it can take on
$L^{(m)}$ different values, and $K = \prod_{m=1}^{M} L^{(m)}$ is the
total number of possible configurations of the switch variable, $s_t$.
Also, $s_t$ depends explicitly on the previous time step, so that
$p(s_t \mid s_{t-1}) = \prod_{m=1}^{M} p(f_t^{(m)} \mid f_{t-1}^{(m)})$.
Conditioned on a particular regime, the FSLDS is equivalent to an LDS.
The FSLDS can then be seen as a collection of LDSs, where each LDS
models the dynamics of a patient’s underlying physiology under a
particular regime, and can also be used to generate a patient’s observed
vital signs. An LDS provides a generative framework for modelling our
belief over the state space, given observations.
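As a rough illustration of the factorised switch dynamics, the sketch below builds the joint transition matrix over all $K$ switch settings from per-factor transition matrices. The two factors and their transition probabilities are hypothetical values chosen for the example, not parameters from the paper; the only fact relied upon is that independent factor transitions multiply, which corresponds to a Kronecker product.

```python
import itertools
import numpy as np

def joint_transition(factor_transitions):
    """Build the K x K transition matrix of s_t from per-factor matrices.

    factor_transitions[m][i, j] = p(f_t^(m) = j | f_{t-1}^(m) = i).
    Since the factors evolve independently, the joint matrix is the
    Kronecker product of the per-factor matrices.
    """
    joint = np.array([[1.0]])
    for T_m in factor_transitions:
        joint = np.kron(joint, np.asarray(T_m, dtype=float))
    return joint

# Two hypothetical binary factors (illustrative numbers only).
T_bs = np.array([[0.99, 0.01], [0.05, 0.95]])
T_br = np.array([[0.98, 0.02], [0.10, 0.90]])
T_joint = joint_transition([T_bs, T_br])

# The K = 2 * 2 = 4 switch settings, in the same order as np.kron uses
# (first factor varies slowest).
settings = list(itertools.product([0, 1], repeat=2))
```

Each row of `T_joint` is a distribution over the next switch setting, so the number of rows grows multiplicatively with the number of factors.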


We can alternatively adopt a discriminative view. We start by modelling
$p(s_t \mid y_{t-l:t+r})$ with a discriminative classifier, where
(features of) observations from the previous $l$ and future $r$ time
steps affect the belief of the model about $s_t$. The inclusion of $r$
frames of future context is analogous to fixed-lag smoothing in an FSLDS
(see e.g. Särkkä, 2013, sec. 10.5). We note that inclusion of future
observations in the conditioning set means that the DSLDS will operate
with a delay of $r$ seconds, since an output of the model at time $t$
can be produced only after time $t + r$. Provided that $r$ is small
enough ($r \le 10$ in experiments), this delay is negligible compared to
the increase in performance. The LDS can also be regarded from a
similarly discriminative viewpoint which allows us to model
$p(x_t \mid x_{t-1}, y_t)$. This is similar to the Maximum Entropy
Markov Model (MEMM) (McCallum et al., 2000), with the difference that
the latent variable is continuous rather than discrete. The main
advantage of this discriminative view is that it allows for a rich
number of (potentially highly correlated) features to be used without
having to explicitly model their distribution or the interactions
between them, as is the case in a generative model. A combination of
these two discriminative viewpoints gives rise to the DSLDS graphical
model in Figure 1 (bottom). The DSLDS, conditioned on $s_t$, can then be
seen as a collection of MEMMs, where each MEMM in the DSLDS plays a role
equivalent to that of each LDS in the FSLDS.

The DSLDS can be defined as

$$p(s, x \mid y) = p(s_1 \mid y_1)\, p(x_1 \mid s_1, y_1) \times \prod_{t=2}^{T} p(s_t \mid y_{t-l:t+r})\, p(x_t \mid x_{t-1}, s_t, y_t). \qquad (1)$$

The simplest assumption we can make for the DSLDS is that
$p(s_t \mid y_{t-l:t+r})$ factorises, so that

$$p(s_t \mid y_{t-l:t+r}) = \prod_{m=1}^{M} p(f_t^{(m)} \mid y_{t-l:t+r}). \qquad (2)$$

However, one could use a structured output model to predict the joint
distribution of different factors.
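Under the factorisation of eq. 2 with binary factors, the distribution over all switch settings follows directly from the per-factor classifier outputs. The sketch below is a minimal illustration; the function name and the three example probabilities are assumptions made for the example.

```python
import itertools

def joint_switch_posterior(factor_probs):
    """Return {setting: probability} over all 2^M binary switch settings.

    factor_probs[m] is a classifier output p(f_t^(m) = 1 | y_{t-l:t+r}).
    Each setting's probability is the product over factors, as in eq. 2.
    """
    posterior = {}
    for setting in itertools.product([0, 1], repeat=len(factor_probs)):
        prob = 1.0
        for active, p_m in zip(setting, factor_probs):
            prob *= p_m if active else (1.0 - p_m)
        posterior[setting] = prob
    return posterior

# Hypothetical outputs of three per-factor classifiers at one time step.
posterior = joint_switch_posterior([0.9, 0.2, 0.05])
```

The resulting dictionary sums to one by construction, since each factor contributes a complete Bernoulli distribution.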

1.1 Predicting $s_t$

Our belief about the state of health of a patient at time $t$ is
modelled by $p(s_t \mid y_{t-l:t+r})$, the conditional probability of
the switch variable given the observed vital signs. Following the
factorisation of the switch variable in eq. 2, we model the conditional
probability of each factor being active at time $t$ given the
observations with a probabilistic discriminative binary classifier, so
that $p(f_t^{(m)} = 1 \mid y_{t-l:t+r}) = G(\phi(y_{t-l:t+r}))$, where
$G(\cdot)$ is a classifier-specific function, and $\phi(y_{t-l:t+r})$ is
the feature vector that acts as input to our model at each time step as
described in Section 2.1. As is evident from Figure 1 (bottom), there is
no explicit temporal dependence on the switch variable sequence.
However, temporal continuity is implicitly incorporated in our model
through the construction of the features.

1.1.1 An α-mixture of $s_t$

The DSLDS model can be seen as complementary to the FSLDS, and they can
be run in parallel. One way of combining the two outputs is to maintain
an α-mixture over $s_t$. If $p_g(s_t)$ and $p_d(s_t)$ are the outputs
for the switch variable at time $t$ from the FSLDS and the DSLDS
respectively, then their α-mixture is given by:

$$p_\alpha(s_t) = c \left( \tfrac{1}{2}\, p_g(s_t)^{(1-\alpha)/2} + \tfrac{1}{2}\, p_d(s_t)^{(1-\alpha)/2} \right)^{2/(1-\alpha)},$$

where $c$ is a normalisation constant which ensures that $p_\alpha(s_t)$
is a probability distribution. The family of α-mixtures then subsumes
various known mixtures of distributions and defines a continuum across
them via the $\alpha$ parameter. For example, for $\alpha = -1$ we
retrieve the mixture of experts (with equally weighted experts)
framework, while for $\alpha \to 1$, the formula yields
$p_1(s_t) = c\sqrt{p_g(s_t)\, p_d(s_t)}$, rendering it equivalent to a
product of experts viewpoint. In general, as $\alpha$ increases, the
α-mixture assigns more weight to the smaller elements of the mixture
(with $\alpha \to \infty$ giving
$p_\infty(s_t) = \min\{p_g(s_t), p_d(s_t)\}$), while as $\alpha$
decreases, more weight is assigned to the larger elements (with
$\alpha \to -\infty$ giving
$p_{-\infty}(s_t) = \max\{p_g(s_t), p_d(s_t)\}$). A thorough treatment
is given in Amari (2007).

Figure 1: Graphical model of the FSLDS (top) and the DSLDS (bottom). The
state-of-health and underlying physiological values of a patient are
represented by $s_t$ and $x_t$ respectively. The shaded nodes correspond
to the observed physiological values, $y_t$. Note that in the case of
the DSLDS the conditional probability $p(s_t \mid y_{t-l:t+r})$ is
modelled directly.

1.2 Predicting $x_t$


The model of the patient’s physiology should capture the underlying
temporal dynamics of their observed vital signs under their current
health state. The idea is that the current latent continuous state of a
patient should be dependent on (a) the latent continuous state at the
previous time step, (b) the current state of health and (c) the current
observed values. We model these assumptions as follows:

$$p(x_t \mid x_{t-1}, s_t, y_t) \propto \exp\left\{-\tfrac{1}{2}\,(x_t - A^{(s_t)} x_{t-1})^\top (Q^{(s_t)})^{-1} (x_t - A^{(s_t)} x_{t-1})\right\} \times \exp\left\{-\tfrac{1}{2}\,(C^{(s_t)} x_t - y_t)^\top (R^{(s_t)})^{-1} (C^{(s_t)} x_t - y_t)\right\}. \qquad (3)$$

The first term on the RHS of eq. 3 is the system model for an LDS and
captures the dynamics of a patient’s latent physiology under state
$s_t$. The second term can be seen as the discriminative counterpart of
the observation model of an LDS. In our condition monitoring setting,
the observed vital signs are considered to be noisy realisations of the
true, latent physiology of a patient and thus, the observation model
encodes our belief that $x_t$ is a noisy version of $y_t$. Under this
assumption, $C^{(s_t)}$ consists of 0/1 entries, which are set based on
our knowledge of whether the observations $y_t$ are artifactual or not
under state $s_t$. In the FSLDS, the corresponding observation model
encodes the belief that the generated $y_t$ should be normally
distributed around $x_t$ with covariance $R^{(s_t)}$, whereas in our
discriminative version, the observation model encodes our belief that
$x_t$ should be normally distributed around $y_t$ with covariance
$R^{(s_t)}$. The idea behind this model is that at each time step we
update our belief about $x_t$ conditioned on its previous value,
$x_{t-1}$, and the current observation, $y_t$, under the current regime
$s_t$. For example, under an artifactual process, the observed signals
do not convey useful information about the underlying physiology of a
patient. In that case, we drop the connection between $y_t$ and $x_t$
(for the artifact-affected channels), which translates into setting the
respective entries of $C^{(s_t)}$ to zero. Then, the latent state $x_t$
evolves only under the influence of the appropriate system dynamics
parameters ($A^{(s_t)}$, $Q^{(s_t)}$). Conversely, operation under a
non-artifactual regime incorporates the information from the observed
signals, effectively transforming the inferential process for $x_t$ into
a product of two “experts”, one propagating probabilities from
$x_{t-1}$ and one from the current observations.
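To make the product-of-experts reading of eq. 3 concrete, the sketch below completes the square for a single step, under the simplifying assumption that $x_{t-1}$ is a known point rather than a distribution (the full model propagates a Gaussian belief instead). The function name and the scalar example values are illustrative assumptions.

```python
import numpy as np

def dslds_x_update(x_prev, y, A, Q, C, R):
    """Mean and covariance of p(x_t | x_{t-1}, s_t, y_t) from eq. 3.

    The system expert is N(x_t; A x_prev, Q); the observation expert adds
    the quadratic form in (C x_t - y). Their product is Gaussian with
    precision Q^{-1} + C^T R^{-1} C.
    """
    Q_inv = np.linalg.inv(Q)
    R_inv = np.linalg.inv(R)
    precision = Q_inv + C.T @ R_inv @ C
    cov = np.linalg.inv(precision)
    mean = cov @ (Q_inv @ A @ x_prev + C.T @ R_inv @ y)
    return mean, cov

# Scalar toy example (all numbers hypothetical).
A = np.array([[1.0]])
Q = np.array([[1.0]])
R = np.array([[1.0]])
x_prev = np.array([0.0])
y = np.array([2.0])

# Non-artifactual regime: the observation is trusted (C = 1).
mean_ok, cov_ok = dslds_x_update(x_prev, y, A, Q, np.array([[1.0]]), R)

# Artifactual regime: the connection to y is dropped (C = 0).
mean_art, cov_art = dslds_x_update(x_prev, y, A, Q, np.array([[0.0]]), R)
```

With the entry of `C` set to zero the observation term vanishes and the update reduces to the system dynamics alone, which matches the behaviour described for artifact-affected channels.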

We note that the step of conditioning on the current regime $s_t$ in
order to predict $x_t$ is required for our task, as we do not have
training data for the x-state. Otherwise, one could imagine building a
simpler model such as a conditional random field (Lafferty et al., 2001)
to predict the x-state directly from the observations. However, in our
case, where only labels about the patient’s regime are available, this
is not possible.


1.3 Learning 

We first describe learning in the general SLDS setting. The parameters
that need to be learned are: $\{A^{(s)}, Q^{(s)}, C^{(s)}, R^{(s)}\}$.
Given training data for each switch setting, these can be learned
independently as LDS parameters for each configuration of $s$. Following
Quinn et al. (2009), we use an independent ARIMA model with added
observation noise for each channel. Casting such a model into state
space form is a standard procedure as described in Brockwell and Davis
(2009, sec. 12.1), and amounts to reformulating the parameters of the
ARIMA model into the parameters of a state-space model. Once the model
is in state space form, $A^{(s)}$, $Q^{(s)}$, $C^{(s)}$, $R^{(s)}$ can
be fit according to the maximum likelihood criterion by using numerical
optimisation methods (like Newton-Raphson, Gauss-Newton), as presented
in Shumway and Stoffer (2000, sec. 2.6), or expectation maximisation
(EM) as presented in Ghahramani and Hinton (1996). We note that the
vector ARMA (VARMA) representation is used, where for example a
one-dimensional AR($p$) process can be encoded as a $(p+1)$-dimensional
VAR(1) process by maintaining a latent state representation of the form
$\mathbf{x}_t = [x_t \; x_{t-1} \; \dots \; x_{t-p}]^\top$.
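The VAR(1) encoding of an AR($p$) process can be sketched as follows. This is a minimal illustration of the state layout described above, with hypothetical AR(2) coefficients; it omits the noise terms and the differencing needed for a full ARIMA model.

```python
import numpy as np

def ar_to_state_space(ar_coeffs):
    """Return (A, C) for an AR(p) process with state [x_t, ..., x_{t-p}]."""
    p = len(ar_coeffs)
    A = np.zeros((p + 1, p + 1))
    A[0, :p] = ar_coeffs      # x_t = a_1 x_{t-1} + ... + a_p x_{t-p}
    A[1:, :-1] = np.eye(p)    # shift the older values down by one slot
    C = np.zeros((1, p + 1))
    C[0, 0] = 1.0             # pick out the most recent value x_t
    return A, C

# Hypothetical AR(2) coefficients.
A, C = ar_to_state_space([0.5, -0.3])

# Noise-free propagation of the state [x_{t-1}, x_{t-2}, x_{t-3}].
state_prev = np.array([1.0, 2.0, 0.0])
state_next = A @ state_prev
```

The first row of `A` carries the autoregressive coefficients and the remaining rows only shift history, so `C` needs a single non-zero entry to read off the current value.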

In the DSLDS, the same set of parameters needs to be learned. As
mentioned in Section 1.2, the assumptions for the DSLDS observation
model constrain $C^{(s)}$ to be a binary matrix, whose values are set so
as to pick the most recent value $x_t$ under the VARMA representation.
For example, assuming that we are modelling one channel, under a
physiological regime, as an AR(2) process, then $C^{(s)} = [1 \; 0 \; 0]$.
Under this constrained form of $C^{(s)}$ we obtain the remaining
parameters, $A^{(s)}$, $Q^{(s)}$ and $R^{(s)}$, using the same learning
process as the one already described for the case of a general SLDS.

The task of determining the order of the respective ARIMA models is less
straightforward. We have followed a practical approach as suggested in
Diggle (1990, sec. 6.2). The autocorrelation and partial autocorrelation
functions (ACF and PACF respectively) of the stationary data (if a time
series is not stationary, we make it stationary by successive
differencing) were examined to provide an initial estimate of the
appropriate model order. A clear cut-off at lag $q$ in the ACF plot is
suggestive of an MA($q$) process, while a clear cut-off at lag $p$ in
the PACF plot is suggestive of an AR($p$) process. Clear cut-offs are
rare in a real-world application, in which case we looked for less clear
tail-offs in the PACF and ACF plots. After establishing a small number
of potential model orders suggested by these tail-offs, further
exploration of the model order around these initial estimates was
carried out by calculating the Akaike Information Criterion (AIC) score
(Akaike, 1972) for each of these potential model orders, and finally the
one with the smallest AIC value was chosen.
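The final AIC comparison can be sketched as below for pure AR candidates. This is an illustrative simplification, not the procedure used in the paper: it assumes a zero-mean, already stationary series, fits each AR order by ordinary least squares rather than maximum likelihood in state-space form, and uses the Gaussian AIC up to an additive constant. The simulated series and its coefficients are made up for the example.

```python
import numpy as np

def ar_aic(series, order, max_order):
    """AIC (up to a constant) of a least-squares AR(order) fit.

    All candidate orders are scored on the same samples, i.e. those
    with at least max_order past values available.
    """
    y = series[max_order:]
    if order > 0:
        lags = [series[max_order - i:len(series) - i] for i in range(1, order + 1)]
        X = np.column_stack(lags)
        coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ coeffs
    else:
        resid = y
    sigma2 = np.mean(resid ** 2)
    n_params = order + 1  # AR coefficients plus the noise variance
    return len(y) * np.log(sigma2) + 2 * n_params

def select_ar_order(series, candidate_orders):
    """Return the candidate order with the smallest AIC, and all scores."""
    max_order = max(candidate_orders)
    scores = {p: ar_aic(series, p, max_order) for p in candidate_orders}
    return min(scores, key=scores.get), scores

# Simulate a hypothetical AR(2) series and score a few candidate orders.
rng = np.random.default_rng(0)
x = np.zeros(2000)
for t in range(2, len(x)):
    x[t] = 0.6 * x[t - 1] - 0.3 * x[t - 2] + rng.normal()
best_order, scores = select_ar_order(x, [0, 1, 2, 3, 4])
```

Scoring every candidate on the same samples keeps the likelihood terms comparable, which matters when the candidate orders differ in how many initial observations they consume.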


1.4 Inference 


In this paper we are concerned with the task of computing the
distribution $p(s_t, x_t \mid y_{1:t+r})$. According to our proposed
model, $p(s_t \mid y_{t-l:t+r})$ can be inferred at each time step via a
classifier as described in Section 1.1. However, exact inference for
$x_t$ is still intractable. The same limitation as in the case of a
standard SLDS applies (Lerner and Parr, 2001): in order to maintain an
exact belief over the posterior distribution of $x_t$ we need to keep
track of all the potential combinations of switch variable settings that
could have led us from $x_{t-1}$ to $x_t$, making inference scale
exponentially with time. An approximation of this distribution can be
maintained via the Gaussian Sum algorithm^1 (Alspach and Sorenson,
1972). The idea is that at each time step $t$ we maintain an
approximation of $p(x_t \mid s_t, y_{1:t+r})$ as a mixture of $J$
Gaussians. Moving one time step forward will result in the posterior
$p(x_{t+1} \mid s_{t+1}, y_{1:t+r+1})$ having $KJ$ components, which are
again collapsed to $J$ components. In our experiments we use $J = 1$,
which translates into matching moments (up to second order) of the
distribution for each setting of $s_t$, as shown in Murphy (1998).
Therefore inference in the DSLDS can be seen as a two-step process,
where $p(s_t \mid y_{t-l:t+r})$ is inferred by our discriminative
classifier, and $p(x_t \mid s_t, y_{1:t+r})$ is inferred according to
the Gaussian Sum algorithm.
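The collapse step for $J = 1$ amounts to replacing a Gaussian mixture by the single Gaussian with the same mean and covariance. The sketch below shows only this moment-matching step, with made-up mixture components; it is not a full Gaussian Sum filter.

```python
import numpy as np

def collapse_to_single_gaussian(weights, means, covs):
    """Moment-match a Gaussian mixture with one Gaussian.

    weights: shape (K,), means: shape (K, d), covs: shape (K, d, d).
    Returns the mixture's overall mean and covariance.
    """
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    means = np.asarray(means, dtype=float)
    covs = np.asarray(covs, dtype=float)
    mean = weights @ means
    diffs = means - mean
    # Within-component covariance plus the spread of the component means.
    cov = (np.einsum("k,kij->ij", weights, covs)
           + np.einsum("k,ki,kj->ij", weights, diffs, diffs))
    return mean, cov

# Two hypothetical one-dimensional components with equal weight.
mean, cov = collapse_to_single_gaussian(
    weights=[0.5, 0.5],
    means=[[0.0], [2.0]],
    covs=[[[1.0]], [[1.0]]],
)
```

The second term in the covariance is what keeps the collapsed Gaussian wide enough to cover components whose means disagree.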


1.5 Related work 


In terms of methodology, our proposed model bears some similarities to
the one used by Lu et al. (2009). However, their model was used to model
spatial relationships and they were only concerned with a binary
discrete latent space. In our case, we are concerned with modelling
temporal structure and we have a richer and more complex discrete latent
space. More importantly, in their work the distribution maintained over
the continuous latent space is a single multivariate Gaussian, whereas
in our model, as described in the previous section, the belief over the
continuous latent space is modelled as a mixture of $KJ$ Gaussians. This
allows us to keep track of multiple modes about the belief over a
patient’s underlying physiology, since this is potentially affected by
multiple factors.

^1 The Gaussian Sum algorithm is also known as the Generalised Pseudo
Bayesian (GPB) algorithm, as mentioned in Murphy (1998).


In terms of application, our work is most similar to the one presented
in Quinn et al. (2009). The same task of inferring artifactual and
physiological processes was considered there. However, a generative
approach was taken there via the use of an FSLDS. In our case, we take a
discriminative approach, which performs better in the experiments
considered below. Also, in Lehman et al. (2014), a switching vector
autoregressive model was used on minute-by-minute heart rate and blood
pressure vital signs to provide inputs for a logistic regression
classifier with the goal of patient outcome prediction. In our work, we
use a more expressive model, capable of modelling both discrete and
continuous latent states under a unified framework, for the purposes of
detecting patients’ state-of-health and inferring their underlying
physiology.


2 Experiments 


In this section we describe experiments on two challenging datasets
comprising patients admitted to ICUs in two different hospitals, namely
a neonatal ICU and an adult ICU. We emphasise that it is highly
non-trivial to obtain annotations for medical datasets as it requires
the very scarce resource of experienced clinicians. Indeed, for the
adult ICU, the annotated data are the product of a one-year
collaboration with that ICU. PhysioNet (Goldberger et al., 2000), a
freely available medical dataset, is not suitable for our task since the
only available time-series annotations are a limited set of
life-threatening/terminal events, for which identification would not be
of practical use in the ICU.

For both datasets, we evaluate the performance of the DSLDS compared to
the FSLDS. We also report the performance of an α-mixture of the two
models. Note that the FSLDS has been shown in Quinn et al. (2009) to
achieve superior results compared to more basic models such as a
factorial hidden Markov model (FHMM) for the task of condition
monitoring in ICUs. We first provide a short description of the various
features that were used as input to the state-of-health model as
described in Section 1.1, followed by an outline of the main
characteristics of the two datasets. We conclude this section by
providing results on two tasks: a) inferring a patient’s state of health
and b) inferring a patient’s underlying physiology in the presence of
artifact corruption.


2.1 Features & Classifiers

As described in Section 1.1, the estimate of $s_t$ is the output of a
discriminative classifier. For both datasets, we found that using a
random forest (Breiman, 2001) as our classification method yields the
best performance. Suggestions for judicious selection of various
tree-construction parameters can be found in Hastie et al. (2009,
Ch. 15). The Gini index was used as the criterion for splitting nodes
for each tree in the random forest. The output of the random forest for
a new test point is an average of the predictions produced by each tree,
where the prediction of each tree is the proportion of the observations
that belong to the positive class in the leaf node to which the test
point belongs. Apart from their high performance, another appealing
property of random forests is that they can handle missing observations
via the construction of surrogate variables and splits within each
decision tree, as explained in Hastie et al. (2009, sec. 9.2.4).


We use a variety of features to capture interesting temporal structure
between successive observations. At each time step, a sliding window of
length $l + r + 1$ is computed. For some features we also divide the
window into further sub-windows and extract additional features from
them. More precisely, the full set of features that are being used are:
(i) the observed, raw values of the previous $l$ and future $r$ time
steps ($y_{t-l:t+r}$); (ii) the slopes (calculated by least squares
fitting) of segments of that sliding window that are obtained by
dividing it in segments of length $(l + r + 1)/k$; (iii) an
exponentially weighted moving average of this window of raw values (with
a kernel of width smaller than $l + r + 1$); (iv) the minimum, median
and maximum of the same segments; (v) the first order differences of the
original window; and (vi) differences of the raw values between
different channels.
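A single-channel version of this feature map might look like the sketch below. It is illustrative only: the function name, the number of segments and the smoothing constant `ewma_alpha` are assumptions, the moving average is a simple recursive variant rather than a fixed-width kernel, and the cross-channel differences of item (vi) are left out because only one channel is handled.

```python
import numpy as np

def window_features(window, n_segments=2, ewma_alpha=0.3):
    """Features for one channel over a window y_{t-l:t+r} of length l + r + 1."""
    window = np.asarray(window, dtype=float)
    feats = list(window)                                       # (i) raw values
    for seg in np.array_split(window, n_segments):
        t = np.arange(len(seg))
        slope = np.polyfit(t, seg, 1)[0] if len(seg) > 1 else 0.0
        feats.append(slope)                                    # (ii) least-squares slope
        feats.extend([seg.min(), np.median(seg), seg.max()])   # (iv) segment statistics
    ewma = window[0]
    for value in window[1:]:
        ewma = ewma_alpha * value + (1.0 - ewma_alpha) * ewma
    feats.append(ewma)                                         # (iii) weighted moving average
    feats.extend(np.diff(window))                              # (v) first-order differences
    return np.array(feats)

# Hypothetical window with l = 4 past and r = 1 future samples.
feats = window_features([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
```

Stacking such vectors for every time step gives the input matrix that a per-factor classifier would be trained on.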


2.2 Neonatal ICU 


The first dataset is the one used in Quinn et al. (2009)^2. It comprises
24-hour periods from fifteen neonates admitted to the ICU of the
Edinburgh Royal Infirmary, with events of interest annotated by two
clinical experts. These annotations include: i) blood sample events
(BS), ii) periods during which an incubator is open (IO), iii) core
temperature probe disconnections (TD), iv) bradycardias (BR), and v)
periods that are clearly not stable but no further identification was
made by the clinicians (X). These last cases can be collectively
considered as a “none-of-the-above” factor, which is referred to as the
X-factor by Quinn et al. (2009). More details about the events of
interest can be found in the aforementioned work. We used the same
parameters for the underlying physiology model as the ones used there.

^2 The dataset has been anonymised and is available at:
www.cit.mak.ac.ug/staff/jquinn/software.html


2.3 Adult ICU 


The second dataset comprises data collected from nine adults admitted to
the neuro ICU of the Southern General Hospital in Glasgow. An average of
33-hour periods were collected from each of these patients, consisting
of measurements recorded on a second-by-second basis for four different
channels: heart rate (HR), systolic and diastolic blood pressure (BPsys,
BPdia), and systolic intracranial pressure (ICPsys). These data were
then annotated by a clinical expert. We give a brief description of the
learning process for stability periods and modelled factors, which
include blood samples (BS), damped traces (DT), suction events (SC), and
the X-factor.


Stable periods correspond to time periods when no annotation occurred
from the experts, suggesting that the patient is in a stable condition.
In Williams and Stanculescu (2011) it was found that in a similar
setting a 15 minute period of stability provides an adequate amount of
training data. We use the same time interval for our experiments. We
found that ARIMA(2,1,0) models were adequate for all channels.

An example of a blood sample is shown in Figure 4 (bottom). Changes in
BPsys and BPdia can be modelled as a four-stage process: i) The blood is
diverted to a syringe for blood sampling, which causes an artifactual
ramp in the observed measurements. This is similar to the blood sample
model described in Quinn et al. (2009) and we follow the same approach
here. ii) A recalibration stage follows, causing measurements to drop to
zero, which can be modelled similarly to a dropout event as in Quinn et
al. (2009). iii) BP measurements continue as a stable period for a brief
period. iv) The blood sample is concluded with a flushing event for
hygiene purposes, which causes a sharp increase in measurements. This
stage is modelled as an AR(3) process for both the BPsys and BPdia
channels. A total of 64 blood sample events have been annotated, with an
average duration of 1.6 minutes.


During a suction event, a flexible catheter is inserted into the airway
of the patient to remove secretions that have accumulated over time in
their pulmonary system. This event is observed as a significant increase
in the values of all observed channels. An AR(2) process models the HR
channel, while AR(3) processes were used to model the remaining
channels. A total of 53 suction events have been annotated, with an
average duration of 4.3 minutes.


A damped trace, an example of which is shown in Figure 4 (top), is
usually observed due to blood residues being accumulated in the line
used for measuring the blood pressure channel, which leads both BPsys
and BPdia to converge to a similar mean value while at the same time the
measurements exhibit high variability. Both channels were modelled with
AR(3) processes. A total of 32 damped trace events have been annotated,
with an average duration of 14 minutes.


Apart from the aforementioned factors which we explicitly model, there
are a multitude of other factors present in our training data,
corresponding to either known but not yet modelled factors (such as
hygiene events, tachycardias etc.) or to unknown factors (clear
abnormalities which however have not been identified by the clinicians).
We collectively treat those events as unknown and model them according
to the X-factor model proposed in Quinn et al. (2009). A total of 278
X-factor events have been annotated, with an average duration of 7.5
minutes. Channels which are unaffected by an artifactual process (as
shown in Table 1) are modelled as in the stable case. In every case the
parameters of the x-state models were further optimised by EM.


Table 1: Channels affected by different processes for the adult ICU are
marked by •.

                 HR    BPsys   BPdia   ICPsys
Blood sample            •       •
Damped trace            •       •
Suction          •      •       •       •
X-factor         •      •       •       •


Table 2: Comparison of DSLDS, FSLDS and α-mixture performance for the
Neonatal ICU dataset. Optimal value of the α parameter is shown inside
parentheses.

AUC               BS     IO     TD     BR     X
DSLDS             0.98   0.83   0.90   0.94   0.57
FSLDS             0.92   0.87   0.88   0.85   0.66
α-mixture (…)     0.98   0.89   0.93   0.92   0.67


2.4 Results 

For both datasets we compare the performance of the DSLDS and the FSLDS
for the task of inferring a patient’s state of health. We measure the
performance of the models by reporting the Area under the Receiver
Operating Characteristic curve (AUC). Also, in Figures 2 and 3 we
provide plots of the Receiver Operating Characteristic curves (ROC) for
the classification of the factors of interest comparing the DSLDS, the
FSLDS, and an α-mixture of the two models.
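For reference, the AUC of one factor can be computed directly from its definition as the probability that a randomly chosen positive time step receives a higher score than a randomly chosen negative one. The sketch below uses this pairwise form, which is quadratic in the number of samples and intended only as an illustration; the labels and scores are made up.

```python
def auc_score(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical per-time-step labels and classifier outputs for one factor.
auc = auc_score([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of the two classes.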

In the case of the DSLDS, the features described in Section 2.1 involve
a number of hyperparameters that need to be chosen. Fitting them with a
standard cross-validation (CV) scheme when data are not abundant poses a
non-negligible risk of overfitting. As is shown in Varma and Simon
(2006), using CV to evaluate performance of a model when the model’s
hyperparameters have been themselves tuned using CV can lead to an
optimistic bias of the estimate of the true performance. In that same
work, a nested CV approach is shown to yield an almost unbiased estimate
of the true performance, which we also follow in our experiments. In the
outer loop the data are partitioned into $P$ disjoint test sets. After
choosing one of these partitions, the rest of the data are used in the
inner loop in a standard CV setup to select the hyperparameters. The
hyperparameters which yielded the highest performance (average
cross-validated AUC across factors in our case) in the inner loop are
then used to estimate the performance of the model on the partition
(test set) in the outer loop. This process is repeated $P$ times, once
for each partition in the outer loop. For both datasets, we use
leave-one-patient-out CV for the inner loop and 3-fold CV for the outer
loop. In the inner loop, we perform a grid search over hyperparameters
in the following sets: a) number of trees of random forest classifiers
in {10, 25, 50, 100, 200}; b) $l$ in {4, 9, 14, 19, 29, 49}; c) $r$ in
{0, 5, 10}. The sub-segment lengths (for slope features) were always set
to $\max\{5, (l + r + 1)/5\}$ and the kernel widths (for moving average
features) were always set to $\max\{5, (l + r + 1)/5\}$.
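The nested scheme can be sketched as a pair of loops over patients. In the sketch below the patient identifiers, the way patients are assigned to outer folds and the `toy_score` callback are all hypothetical stand-ins; a real `fit_and_score` would train the classifiers on the training patients and return the average AUC across factors on the test patients. Only the grid values are taken from the text.

```python
import itertools

def nested_cv(patients, grid, fit_and_score, n_outer=3):
    """Nested CV: leave-one-patient-out inside, n_outer folds outside.

    fit_and_score(train_patients, test_patients, params) must return a
    performance score; it is a user-supplied (here hypothetical) callback.
    """
    names = sorted(grid)
    candidates = [dict(zip(names, values))
                  for values in itertools.product(*(grid[name] for name in names))]
    outer_folds = [patients[i::n_outer] for i in range(n_outer)]
    outer_scores = []
    for test_fold in outer_folds:
        train = [p for p in patients if p not in test_fold]

        def inner_score(params):
            # Leave-one-patient-out CV restricted to the outer training set.
            scores = [fit_and_score([q for q in train if q != held_out],
                                    [held_out], params)
                      for held_out in train]
            return sum(scores) / len(scores)

        best = max(candidates, key=inner_score)
        # The held-out outer fold is touched only once, with the chosen setting.
        outer_scores.append(fit_and_score(train, test_fold, best))
    return outer_scores

def toy_score(train_patients, test_patients, params):
    """Stand-in scorer that simply prefers l = 9 (no model is trained)."""
    return 1.0 - abs(params["l"] - 9) / 100.0

grid = {"n_trees": [10, 25, 50, 100, 200],
        "l": [4, 9, 14, 19, 29, 49],
        "r": [0, 5, 10]}
patients = ["p1", "p2", "p3", "p4", "p5", "p6"]
outer_scores = nested_cv(patients, grid, toy_score)
```

Because hyperparameters are chosen without ever seeing the outer test fold, the mean of `outer_scores` avoids the optimistic bias of tuning and evaluating on the same split.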

In the case of the FSLDS, it is not necessary to follow the same
procedure. Using the AIC score, as shown in Section 1.3, for choosing
the orders of the ARIMA processes (which constitute the model’s
hyperparameters) avoids potential overfitting by penalising the model’s
likelihood as the parameters grow. We therefore use 3-fold CV to
evaluate the FSLDS’s performance.

To evaluate the α-mixture model, we have chosen the optimal α value as
the one that maximises the average AUC across factors, via 3-fold CV.
This also allowed us to explore the behaviour of the model as a function
of α for both datasets.
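The α-mixture of Section 1.1.1 is straightforward to evaluate for a candidate α. The sketch below assumes the equally weighted form of Amari's α-mixture, which reproduces the limiting cases listed in that section, and it assumes strictly positive probabilities (negative exponents are undefined at zero). The two input distributions are made-up numbers.

```python
import numpy as np

def alpha_mixture(p_g, p_d, alpha):
    """Equally weighted alpha-mixture of two discrete distributions.

    Assumes all entries of p_g and p_d are strictly positive.
    """
    p_g = np.asarray(p_g, dtype=float)
    p_d = np.asarray(p_d, dtype=float)
    if np.isclose(alpha, 1.0):
        unnorm = np.sqrt(p_g * p_d)   # limit alpha -> 1: geometric mean
    else:
        k = (1.0 - alpha) / 2.0
        unnorm = (0.5 * p_g ** k + 0.5 * p_d ** k) ** (1.0 / k)
    return unnorm / unnorm.sum()      # the constant c normalises the result

# Hypothetical FSLDS and DSLDS outputs over two switch settings.
p_g = [0.8, 0.2]
p_d = [0.6, 0.4]
mix_experts = alpha_mixture(p_g, p_d, alpha=-1.0)   # arithmetic mean
prod_experts = alpha_mixture(p_g, p_d, alpha=1.0)   # product of experts
near_min = alpha_mixture(p_g, p_d, alpha=200.0)     # approaches the normalised minimum
```

Sweeping `alpha` over a grid and scoring each mixture by cross-validated AUC would mirror the selection procedure described above.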

2.4.1 Neonatal ICU 

In the case of the neonatal ICU we compare the two 
m odels on the full se t of annotated factors reported 
in Quinn et al. (20^. The results are shown in Ta¬ 
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Figure 2: ROC curves per modelled factor in the case 
of the neonatal ICU. 


ble [20. The DSLDS outperforms the FSLDS in three 
out of the four clinically identified factors. The
difference in favour of the DSLDS is clear for bradycardias
and blood samples, but less pronounced for core
temperature disconnections. The FSLDS achieves slightly
higher performance in the case of the incubator open
factor, and clearly outperforms the DSLDS in the case
of the X-factor. The FSLDS models the presence of
outliers by the inclusion of an extra factor, which is
essentially governed by the same parameters as stability,
with the only difference being that the system
noise covariance is an inflated version of the respective
covariance of the stability dynamics (for more details,
see Quinn et al. (2009)). Such an approach has the
potential to address the issue of outlier detection in a
more general and thus more satisfactory way. In the
case of the DSLDS, our approach is to collectively treat
all abnormal events, other than the ones attributed to
known factors, as an “X-class” and build a binary
classifier to distinguish that class. As the training data
points for this class are highly inhomogeneous in terms
of shared discriminative features, and test points
belonging to the X-class may not exhibit a high degree
of similarity to the training set, it is not surprising
that the DSLDS may perform rather poorly for the
X-factor. However, by considering an α-mixture of
the two models, we can combine the discriminative
power of the DSLDS for known factors with the
increased performance of the FSLDS for the X-factor,
thus achieving a higher performance (bottom line of
Table 2) compared to considering the two models
separately. The behaviour of the α-mixture model as a
function of α is shown in Figure 5 (top). The optimal
α-mixture (α = 0.5) yields the best average AUC
across factors (in fact, α = 0.5 yields optimal
performance for each factor separately except bradycardia,
where it is almost optimal) compared to all other
considered α values and also outperforms the DSLDS and
the FSLDS in all cases except for the bradycardia
factor, where the DSLDS performs slightly better.

^The FSLDS results were obtained using code provided
by Quinn et al. (2009) with the same parameters as the
ones mentioned there. The results are very close with the
exception of the core temperature disconnection factor (for
which the reported AUC in Quinn et al. (2009) was 0.79,
while we obtained a value of 0.88), and the blood sample
factor (for which the reported AUC in Quinn et al. (2009)
was 0.96, while we obtained a value of 0.92).

Table 3: Comparison of DSLDS, FSLDS and α-mixture
performance for the Adult ICU dataset. Optimal
value of the α parameter is shown inside parentheses.

AUC                 BS     DT     SC     X
DSLDS               0.96   0.93   0.67   0.65
FSLDS               0.95   0.79   0.57   0.74
α-mixture (α = 0)   0.99   0.94   0.70   0.71

Figure 3: ROC curves per modelled factor in the case
of the adult ICU.

Figure 4: Example of DSLDS and FSLDS inferences
for a damped trace event (top) and a blood sample
event (bottom).


2.4.2 Adult ICU 

In the case of the adult ICU, inferences for two example
events are shown in Figure 4. In the top panel, a damped
trace event is shown, which lasts for almost one hour
before being resolved by a flushing event (spiking of
both channels). The DSLDS accurately identifies the
damped trace event, while the FSLDS fails entirely to
detect it and instead hypothesises several incorrect
blood sample events. In the bottom panel a blood sample
event is shown, where the multiple stages are clearly
visible. The event starts with two artifactual ramps,
followed by a flushing, a zeroing, and finally another
flushing. This is slightly different from the description
we have already given, but slight deviations from the
standard protocol due to human error are to be expected.
In this case, both models manage to capture the event
in a generally satisfactory manner. Summary results
are reported in Table 3. The DSLDS outperforms the
FSLDS on all of the known factors. The damped trace
and suction events in particular are characterised by
high variability, which is hard to capture with a
generative process. However, simple discriminative features
are able to capture them with higher accuracy. As was
expected, the FSLDS achieves a higher AUC for the
X-factor. Again, the optimal α-mixture (α = 0)
outperforms the DSLDS and the FSLDS in all cases except
for the X-factor, where the FSLDS achieves a slightly
higher AUC. Contrary to the neonatal ICU dataset,
as shown in Figure 5 (bottom) there are alternative α
values which can yield higher AUC across different
factors. For example, an X-factor AUC value of 0.76 can
be obtained by setting α = 5. However, apart from the
superior (on average) performance of the α-mixture,
another appealing property is that α could be treated
as a user-tunable parameter. In a practical setting, the
model could be preset with the optimal α value, but a
clinician could decide, for example, to make the model
focus on maximising its predictive performance on the
X-factor (or some important physiological factor like
bradycardia) to the potential detriment of other
factors. Then the model could adjust its α parameter in
real-time based on training data results to maximise
its performance on the desired factor.

Figure 5: Performance of the α-mixture models as a
function of α (step = 0.25) for the Adult ICU (top)
and the neonatal ICU dataset (bottom). The asterisk
marks the optimal value for α.

2.4.3 Inference for x-state 

Finally, Figure 6 shows the inferred distribution of
underlying physiology during a blood sample taken from
a neonate for both models. In both cases, estimates
are propagated with increased uncertainty under the
correctly inferred artifactual event. Note a small
difference at the start of the event: the DSLDS partially
identifies the event, causing an increase in uncertainty,
while the FSLDS (incorrectly) identifies this part as
stable and thus its x-state update exhibits lower
uncertainty. An estimate of the underlying vital signs
maintained in the presence of artifacts can then be used
for data imputation. Another use, which has been deemed
important by our clinical experts, is that such an
estimate can help doctors maintain an approximate view
of a patient’s underlying physiology during artifactual
events that would otherwise completely obscure a
patient’s vital signs. This can be crucial during
treatment of a patient under critical conditions, such as
the ones found in an ICU.

Figure 6: Example of the inferred underlying physiology
in the presence of a blood sample in the case of the
DSLDS (top) and the FSLDS (bottom). The solid line
corresponds to the actual observations, while the
estimated true physiology is plotted as a dashed line with
the shaded area indicating two standard deviations.
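
The widening of the uncertainty band during an artifact can be
illustrated with a scalar Kalman filter that skips the measurement
update while an artifact flag is set. This is a deliberately
simplified sketch and not the model used in the paper: it assumes
random-walk dynamics, a hard artifact flag, and hypothetical noise
variances `q` and `r_noise`, whereas the DSLDS and FSLDS switch
between dynamical regimes.

```python
def filter_with_artifacts(observations, artifact, q=0.1, r_noise=1.0,
                          mean0=0.0, var0=1.0):
    """Scalar random-walk Kalman filter with artifact gating.

    observations: list of floats; artifact: list of bools, same length.
    When artifact[t] is True the observation is ignored, so only the
    prediction step runs and the variance grows by q.
    Returns a list of (mean, variance) pairs, one per time step.
    """
    mean, var = mean0, var0
    estimates = []
    for y, is_artifact in zip(observations, artifact):
        # Predict: x_t = x_{t-1} + noise with variance q.
        var = var + q
        if not is_artifact:
            # Update: y_t = x_t + noise with variance r_noise.
            gain = var / (var + r_noise)
            mean = mean + gain * (y - mean)
            var = (1.0 - gain) * var
        estimates.append((mean, var))
    return estimates
```

A band of two standard deviations, as plotted in Figure 6, would
correspond to mean ± 2 * var ** 0.5 at each time step.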


3 Discussion 

We have presented a discriminative approach for the
very important application of patient monitoring in
ICUs. We show that our new approach is able to
outperform the previous generative approach used for
the same task in most of the investigated cases. We
also show that an α-mixture of the two approaches
yields better results than either model separately. In
our approach we have assumed that the prediction of
the switching variable factorises over the state space.
However, one could use a structured output model to
predict the joint distribution of different factors.

Finally, another issue is the lack of explicit temporal
continuity in the s-chain. Implicitly, this is handled by
the feature construction process. However, a future
direction could be to establish a Markovian connection
on the s-chain too and compare with our current
approach.